low-rank constraint
- Europe > Switzerland (0.04)
- Asia > Thailand (0.04)
- Oceania > Australia (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Leisure & Entertainment (0.48)
- Media > Music (0.30)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.98)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Low-Rank Constraints for Fast Inference in Structured Models
Structured distributions, i.e., distributions over combinatorial spaces, are commonly used to learn latent probabilistic representations from observed data. However, scaling these models is bottlenecked by high computational and memory complexity with respect to the size of the latent representation. Common models such as Hidden Markov Models (HMMs) and Probabilistic Context-Free Grammars (PCFGs) require time and space quadratic and cubic in the number of hidden states, respectively. This work demonstrates a simple approach to reducing the computational and memory complexity of a large class of structured models. We show that by viewing the central inference step as a matrix-vector product and imposing a low-rank constraint, we can trade off model expressivity and speed via the rank. Experiments with neurally parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces while providing practical speedups.
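To make the matrix-vector view concrete, here is a minimal sketch (not the paper's implementation; shapes and names are illustrative) of one HMM forward update, dense versus low-rank. With the transition matrix factored as A = UVᵀ of rank r, the O(n²) product Aᵀα becomes two O(nr) products:

```python
import numpy as np

def forward_step_dense(alpha, A, b):
    # Standard HMM forward update alpha' = (A^T alpha) * b: O(n^2) for n states.
    return (A.T @ alpha) * b

def forward_step_low_rank(alpha, U, V, b):
    # Same update with A = U @ V.T of rank r: two O(n*r) mat-vecs instead of one O(n^2).
    return (V @ (U.T @ alpha)) * b

# Sanity check: the two updates agree whenever A = U @ V.T.
n, r = 512, 16
rng = np.random.default_rng(0)
U, V = rng.random((n, r)), rng.random((n, r))  # nonnegative factors keep A nonnegative
alpha, b = rng.random(n), rng.random(n)
assert np.allclose(forward_step_dense(alpha, U @ V.T, b),
                   forward_step_low_rank(alpha, U, V, b))
```

The same substitution applies to the other dynamic programs the abstract mentions, where the central inference step is likewise a (batched) matrix-vector product.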
A. Expressivity of Low-Rank Models
We focus on the simplest case, HMMs, for an analysis of expressivity. We show that no 2-state HMM can realize a particular target marginal distribution; the argument starts by setting up a system of equations. We provide the low-rank hypergraph marginalization algorithms for HMMs and PCFGs in Alg. 4, with loops over labels. Words outside of the vocabulary are mapped to the UNK token, and the dataset lengths are given in Table 3.
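For concreteness, the general shape of such a system is standard (a sketch in common HMM notation; the specific target marginal from the paper is not reproduced here). A 2-state HMM with start distribution π, transition matrix A, and emission matrix B assigns to an observation pair (x_1, x_2):

```latex
p(x_1, x_2) \;=\; \sum_{z_1=1}^{2} \sum_{z_2=1}^{2} \pi_{z_1}\, B_{z_1 x_1}\, A_{z_1 z_2}\, B_{z_2 x_2}.
```

Equating p(x_1, x_2) to the target probability for every observation pair yields a polynomial system in the entries of π, A, and B; showing that this system has no solution under the nonnegativity and normalization constraints proves that the target distribution lies outside the 2-state HMM family.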
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?
Chen, Xi, Feng, Kaituo, Li, Changsheng, Lai, Xunhao, Yue, Xiangyu, Yuan, Ye, Wang, Guoren
Low-rank training has emerged as a promising approach for reducing memory usage when training Large Language Models (LLMs). Previous methods either decompose weight matrices (e.g., LoRA) or decompose gradient matrices (e.g., GaLore) to reduce memory consumption. However, both constrain training to a low-rank subspace, inevitably leading to sub-optimal performance. This raises a question: is it possible to consistently preserve the low-rank constraint for memory efficiency while achieving full-rank training (i.e., training with full-rank gradients of full-rank weights) to avoid inferior outcomes? In this paper, we propose Fira, a new plug-and-play training framework for LLMs, as a first attempt to achieve this goal. First, we observe an interesting phenomenon during LLM training: the scaling impact of adaptive optimizers (e.g., Adam) on the gradient norm remains similar from low-rank to full-rank training. Based on this observation, we propose a norm-based scaling method that uses the scaling impact of low-rank optimizers as a substitute for that of the original full-rank optimizers, enabling full-rank training. In this way, we preserve the low-rank constraint in the optimizer while achieving full-rank training for better performance. Moreover, we find that sudden gradient rises during optimization can cause loss spikes. To address this, we further put forward a norm-growth limiter that smooths the gradient by regulating the relative increase of gradient norms. Extensive experiments on the pre-training and fine-tuning of LLMs show that Fira outperforms both LoRA and GaLore, achieving performance comparable to or even better than full-rank training.
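As a rough sketch of the two mechanisms the abstract describes (hedged: this is not the authors' implementation; `adam_step`, the projector `P`, and the constant `gamma` are illustrative assumptions), one plausible reading is: run the optimizer in the low-rank subspace, reuse its norm-scaling ratio on the otherwise-discarded full-rank residual, and clip sudden relative growth of the gradient norm:

```python
import numpy as np

def norm_scaled_full_rank_update(G, P, adam_step, eps=1e-8):
    # Norm-based scaling (sketch). G: full-rank gradient (n x m); P: orthonormal
    # low-rank projector (n x r); adam_step: the optimizer's update rule.
    R = P.T @ G                   # low-rank gradient, as in GaLore-style methods
    residual = G - P @ R          # full-rank remainder, normally thrown away
    stepped = adam_step(R)        # optimizer state lives only in the r-dim subspace
    scale = np.linalg.norm(stepped) / (np.linalg.norm(R) + eps)
    # Reuse the optimizer's norm-scaling ratio on the residual so the update
    # covers the full-rank gradient while optimizer memory stays low-rank.
    return P @ stepped + scale * residual

def norm_growth_limiter(grad, prev_norm, gamma=1.01):
    # Limit the *relative* growth of the gradient norm to damp sudden spikes:
    # if ||g_t|| > gamma * ||g_{t-1}||, rescale g_t so the ratio equals gamma.
    norm = np.linalg.norm(grad)
    if prev_norm is not None and norm > gamma * prev_norm:
        grad = grad * (gamma * prev_norm / norm)
        norm = gamma * prev_norm
    return grad, norm  # carry `norm` forward as prev_norm for the next step
```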
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Asia > China > Hong Kong (0.04)
- Asia > China > Hebei Province (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.70)
Non-rigid Structure-from-Motion: Temporally-smooth Procrustean Alignment and Spatially-variant Deformation Modeling
Shi, Jiawei, Deng, Hui, Dai, Yuchao
Even though Non-rigid Structure-from-Motion (NRSfM) has been extensively studied and great progress has been made, key challenges still hinder its broad real-world application: 1) the inherent motion/rotation ambiguity requires either explicit camera motion recovery with extra constraints or complex Procrustean Alignment; 2) existing low-rank modeling of the global shape can over-penalize drastic deformations in the 3D shape sequence. This paper resolves these issues from a spatial-temporal modeling perspective. First, we propose a novel Temporally-smooth Procrustean Alignment module that estimates 3D deforming shapes and adjusts the camera motion by aligning the 3D shape sequence consecutively. Our new alignment module removes the need for a complex reference 3D shape during alignment, which is more conducive to non-isotropic deformation modeling. Second, we propose a spatially-weighted approach that enforces the low-rank constraint adaptively at different locations to better accommodate drastic spatially-variant deformations. Our model outperforms existing low-rank based methods, and extensive experiments across different datasets validate the effectiveness of our method.
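For intuition about the alignment step, here is a minimal sketch of consecutively aligning a 3D shape sequence with SVD-based orthogonal Procrustes (an assumption: this is a generic building block, not the authors' Temporally-smooth Procrustean Alignment module; shapes are centered 3 x P point sets):

```python
import numpy as np

def procrustes_rotation(S, S_ref):
    # Orthogonal Procrustes: rotation R minimizing ||R @ S - S_ref||_F
    # for centered 3 x P point sets S and S_ref (Kabsch algorithm).
    U, _, Vt = np.linalg.svd(S_ref @ S.T)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # exclude reflections
    return U @ D @ Vt

def align_sequence(shapes):
    # Align each shape to its (already aligned) predecessor, removing the
    # frame-to-frame rotation ambiguity while keeping the sequence temporally smooth.
    aligned = [shapes[0]]
    for S in shapes[1:]:
        aligned.append(procrustes_rotation(S, aligned[-1]) @ S)
    return aligned
```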
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China (0.04)